>> Jennifer P.: Closed captioning is available. Once you open the caption window, you'll see there is an option to adjust the font size or the font style. You'll want to go ahead then and click the show/hide header so you are able to see the full caption window. One more reminder about the Q and A panel: that is how you can get your technical questions answered. And if for some reason you end up outside of the room, please note that the email has the telephone number as well as the event number for our session today, and the WebEx tech support folks can assist you. I'm thrilled to be here with my colleague Kendra Morgan. We are here to support you as today's producers. And I would also like to welcome you on behalf of WebJunction. We are always thrilled to be able to gather together in this virtual space and look forward to seeing you on webjunction.org/rural where many, many of us are gathering to share resources, conversation, and learning. And I just wanted to give a special shout out and thanks to WebJunction North Carolina, as we have our North Carolina colleagues here. North Carolina hosts a WebJunction North Carolina site for library staff in North Carolina, and there are many, many North Carolinians on the call; so far there are 133 registrants from North Carolina alone. So welcome to all of you from North Carolina. So I'm thrilled to introduce our presenters today. And I'm not going to go into too much of an introduction because they're going to be introducing themselves. But we're excited to have Lisa, Jennifer, and Amy here. Amy presented earlier this year at a symposium on digital preservation, and she was so wise to note that many of the attendees had lots of introductory questions. So she said, my staff and I are working on a workshop on introducing folks to digital preservation, and we'd like to bring it to the broader community.
And of course we jumped at that opportunity and are so thankful to you, Amy, for thinking of the broader community, and I'm going to go ahead and pass it on over to you and let you get us started.
>> Amy: Thank you, Jennifer. It's a pleasure for us to be here today and to be talking to everyone who's joined us. We are talking about digital preservation, and we're hoping that we're doing this discussion today at a pretty basic introductory level. And so for those of you who are working in the field already, I hope that this won't be too basic for you, but for those of you who aren't, it will be a good introduction. And when we were starting this workshop, we decided that since you couldn't see us, we wanted to make sure you had eye candy, and that's the reason for the background that you see there. So the reason that we've titled it "Someday They'll Thank You": digital preservation is really hard to sell to administration. It's definitely not sexy, and because it's about maintaining data, administration doesn't always get it. So when you're thinking about projects that really sort of make your administrators happy, you know they tend to have obvious outcomes, and the fact of the matter is that the outcomes of digital preservation probably won't be seen for years or decades, and so that's the title of the presentation. Someday they'll thank you. So before we get started, what I'd like to do is get a sense of where folks are at with this concept of digital preservation. And so I'm asking that you take the poll. It's a feedback poll, and it's going to look like that little red check mark in the image on the bottom, and what I want you to answer with a yes or no is whether or not you think that digitization and digital preservation mean the same thing. Answer from the heart. All right. We're getting quite a few answers, and it's pretty exciting to see the numbers. And I see some folks are answering in the chat as well. OK. That's good.
If you open up the poll, you'll see that we've got a pretty interesting disparity between yes and no. And you then can get a sense of how we feel as we go along, because we'll be talking about this. Here's what we're going to talk about today. It's a lot of information in two hours. My portion of the presentation is going to introduce you to the basic concepts that Lisa and Jennifer will offer more depth about. The plan is for me to plunge into some of the technical stuff, and then hand over the presentation to Lisa, who's going to describe the digital object life cycle and the threats to preserving your digital resources. Jennifer will end on a happy note with her discussion about some of the strategies to thwart or at least prepare yourself for some of these challenges. So as Jennifer Peterson pointed out, we did want to let you know who we were. We all work for the State Library of North Carolina in the Digital Information Management Program. There are three of us represented from the program today, and you can see who we are. Lisa and I are there, and Jennifer is representing herself as a huge fan of N.C. State athletics. That's the Wolfpack, or the wolf, or something like that. I'm not a big fan of athletics, so I can't tell you exactly what that is. But that's who we are, and we have combined experience of digital preservation and digital projects management of about 15 years. So this is not new to us, and we are definitely in the trenches working on digital preservation issues. In our work here at the State Library, we confront many of the challenges, and we're attempting to implement some of the strategies you'll hear today. What we're presenting in the strategy section is really the ideals, and the basic activities to get you and your institutions thinking smartly about preserving your digital assets. This is something huge.
You want to keep it in the back of your head: these are the ideals, this is where we'd like to see you moving in this area, but you may not be able to institute all of these things. So back to who we are. Here, we are actually legally mandated to preserve the digital publications for the state of North Carolina. So we really are in the trenches. We're active participants in national organizations, and we speak, write, and do presentations of this nature about this topic, and some of us are actually trained in digital curation. So I want to jump in now and start talking about digital resources. You're going to hear a lot of terms today. Digital files, digital resources, digital objects. They all refer to the same thing. Those are basically the files that exist, singly or in combination, on the computer, and they could reasonably be described as a digital resource. Those include electronic text, so, for example, a word processing document or webpage. A series of digital images, so things like reproductions of works of art that are viewable on a computer screen. Databases, these are collections of information that are usually structured and can be searched by a user. Multimedia and layered files, so audio, video, GIF, those things, and even virtual reality files. So visual environments, giving the illusion of three-dimensional space. The information that constitutes these digital resources is really in a machine-readable format. If you were to view what was behind a section of an image or a scan of the Mona Lisa that was presented on the internet, some image you found on the internet of the Mona Lisa, what you'd see is something like this 1 and 0 pattern. This is called binary notation. And it's the language that computers understand. So why am I telling you this?
Because we're going to be talking about bit-level preservation, and a little later on I'll delve into this more deeply, but these 1s and 0s are the bits we're talking about when we're talking about bit-level preservation. So another concept that I want to introduce to you is born digital versus digitized. And they're important for a couple of reasons that I'll get into, but first I want to explain what the distinctions are. So the first type of resource that I want to introduce, for some of you, probably, is born digital. And these are files that are created natively on electronic devices. So any file that you create on your computer, or on your cell phone or digital camera, or audio recorders. That might include tweets, Word documents, anything you do on Facebook. All of that information, if it's got a file format, so it's a distinct file, is going to be a born-digital resource. And that is in opposition to, or perhaps compared to, a digitized resource. So in this case, we have resources that were once analog, and were then transferred to a digital format. An analog object is basically just a thing that's not made up of those 0s and 1s. So paper documents, photographic materials like slides and prints, glass plate negatives. 3D objects, things of that nature. Cassette tapes. All of these things are analog objects that you might transfer to digital format through scanning or through some other process. So here's an example of digitized resources. So why the distinction? Basically when you're talking about digital preservation, people often forget about those born-digital objects. So we talk about all these scanned files that we create, that we need to preserve, but we don't talk about the websites and the tweets, and the digital photographs that we're creating en masse.
And these are really, really -- they're not just around in great numbers, but they're also important pieces of information. So much of the work we do is now created in Microsoft Office, or similar office applications, and that information is going to be just as important as something that was written with a quill pen on a piece of paper hundreds of years ago. Something to consider too is that the different types of digital objects may require different care. So, for example, born-digital texts are often smaller in size than digitized texts. We may need to manage those in a different way. And also the provenance is often more difficult to track. So we'll talk about this a little bit later. But I want you to keep that in mind. Because part of what we do in digital preservation is track as close to the beginning of a file's creation as possible. Just like we do in archiving. In addition to those other distinctions, I think it's important to differentiate this concept of digitization and digital preservation. Now, it looks like most of you here understand that they are not the same thing. And for those of you who may not be aware of this, we're letting you in on the secret. They aren't. Digitization provides greater access to materials in a lot of cases, which may lead to the decision to preserve either or both the analog and digital files later on. However, digitization creates new digital objects that themselves require preservation, and creates metadata that requires preservation as well. So instead of having a single painting of the Mona Lisa to preserve, we now have the painting, the scanned file, and the metadata. That's three things where we just had one before. So really in most cases, digitization is not the same as digital preservation. This is your new mantra. I ask you to consider it, anyway.
Maybe it's not your new mantra today, but if you find yourself working in digital preservation, it should definitely be your mantra at that point. OK, so why do we care about digital preservation? I want to give you a couple of quick stories about some fails, basically, some major digital preservation failures. And they're near disasters, and I'm hoping that by sharing these with you it will be something that you won't experience yourself. The first story is from the British Library, and some of you probably know this story. There was a census of England made between 1086 and 1090, and the census was incredibly complete in its accounting of all the property in the kingdom. And it was called the Domesday Book. They called it this because its authority was so absolute it was like that expected at the last judgment, or doomsday. Property changes hands, but updating the massive Domesday was not practical, and parts of it remained relevant. So finding the relevant bits for an individual judgment would have been daunting. Basically this object was considered one of the greatest records of English history, and it was written by scribes on parchment, and it exists to this day. It went back and forth, different people owned it, but it's safely preserved at this point. So then in 1986, almost a thousand years after this parchment book was created (it's actually I think two volumes, it's huge), the BBC in England decided that they were going to create a digital version to commemorate the anniversary. They involved the whole country, from school children on up. They gathered audio clips from farmers discussing the Chernobyl fallout, all kinds of information. And they stored the data on the 12-inch disks in the image you can see there, and they were viewable using a microcomputer called Acorn that used a proprietary format, and the digital files lasted about 15 years.
And it took a team from Leeds University and the University of Michigan to make these files work again, and it took until the mid 2000s to do that. It was a near failure. Almost all of this data was close to being lost. It took a lot of federal funds from two wealthy nations, well, they were wealthy at the time, to save these files, and they did manage to pull most of the information out. But it was a near miss. The other story I wanted to tell you about was NASA. And you may have heard some of these horror stories before. This is one in particular about the Lunar Orbiter, which was a project in 1966 and '67 from NASA. It was five unmanned spacecraft that were sent up to the moon with telescopes that could focus on objects as small as a yard. So three feet. The cameras were built with these really high-end lenses, and there was an on-board darkroom. It was all automated, so images were taken of the moon to map the entire surface, and then they were transmitted back to earth. It was the program that took the first pictures of earth as a full planet. You may actually recognize this image, which is often shown rotated 90 degrees clockwise, but this is the way it's supposed to be viewed. This was taken by the Lunar Orbiter. So what happened is, the space race pushed these programs forward, and a lot of the data from earlier programs was kind of set aside. So a NASA archivist began in the 1980s to start collecting the technology from the Lunar Orbiter program, and at this point it had kind of been stuck in a corner gathering dust. So she grabbed all the analog tapes and saved the imaging hardware from government surplus, and then after she retired, she took everything with her and stuck it on wooden pallets in her garage, thinking someone might be interested in it.
It was about 40 years later, after the program had ended, that two engineers became interested in the project. And they hired technicians out of retirement, and located some of the documentation for the technology that the orbiters had on board, and, working out of a converted McDonald's (in fact, they're still working out of this converted McDonald's at Ames Air Force base), they were able to extract some of the best quality images of the moon available. So there was a lot of federal funding that went toward the project, and we were lucky to get it. I think they've been able to extract about 2,000 images? Somebody can correct me on that. But it took lots of federal funding, and now we've got these beautiful images of the moon. And the earth. So I don't know if you have that kind of federal funding at your institution, but I know that we don't. We're lucky to have multiple staff working on these issues. But we definitely don't have the funding to save these projects, to save our data if it's lost in this way. So why care? Because both of these rescue efforts were extremely resource intensive, and, like Nancy Evans, if we don't work now to save today's data, it will be lost forever. Because we don't want to be blamed as the librarians and archivists and content creators who caused the digital dark ages. That's probably a phrase you've heard, and it's somewhat of a scare tactic, but I'm all for it, because I think we need to be scared into doing this kind of work. And so I offer these stories to you so that you have some ammunition to take to your administration when you're advocating for digital preservation. It's not sexy like the fancy digital collections that perhaps you're working on or your colleagues are working on, but it's really important work, and you have to keep reminding yourself that someday somebody is going to thank you for it. OK.
So I want to give a little bit of technical background before Lisa and Jennifer jump in here. We're going to need to put on our thinking caps. It's not too technical, but I did want to go back to the concept of binary numbers and bits. Because so much of what we do is based on this concept. As I mentioned, the bits are made of these binary numbers, 1s and 0s, and a bit is the most basic unit of information on a computer. It's how computers operate. It's the language they understand. There's an on and an off, 0 and 1. Digital objects are just a long string of 0s and 1s, and that's how the data exists. Humans don't do so well with these binary numbers. You can see there's a string of 1s and 0s, and that equals 1582. Those are the same number, just written out in different ways. Binary and decimal. To help us, we group the bits into bytes. So you've heard the term bytes; a byte is basically made up of eight bits. Eight bits per byte allows for 256 unique combinations of 0s and 1s. And we use those to create ASCII characters. These are the basic characters you see in a text document. And this number is generally sufficient to represent all of the necessary letters, numbers, punctuation, and other extended characters in ASCII text. That's what they're called. And here's a breakdown of the measurements of bits and bytes. You've heard kilobyte, megabyte, gigabyte, terabyte, and they go on and on from there. You can see the measurements on the other side. 1,024. Generally speaking we round those numbers to 1,000, to the nearest thousand, when we're talking about digital size. So we may say two megabytes, but really it's 2,048 kilobytes. We're comfortable rounding to the nearest 1,000, and you should be as well. But it's a concept to keep in your mind. As I mentioned, I'm telling you about this because many digital preservation threats occur at the bit level. Here's some terms that relate to this bit-level threat, or challenge.
So bit rot, bit-level degradation, bit-level corruption, bit checks. Because so many of our digital preservation strategies occur at the bit level, we really need to care about this. And we use different technologies to check those bits. And they're called checksums, or bit checking software, which is a logical name. And basically they allow you to see if any of the 0s and 1s have changed. And even a single bit change can cause a file to run into problems. We're going to talk about that more, and you're going to hear the term checksum again, so I wanted to be sure you were at least introduced to it early on. The last concept I want to bring up with you is file extensions. Digital objects, digital resources, are defined by their file extension. So file extensions tell computers how to read, open, and render a digital object. Render just means bring it up on the computer so you can see it in the way that it was meant to be seen. The file extensions are the three or four characters that follow the period. You'll recognize .doc as a Microsoft Word extension, and you'll see others there. If a file extension is wrong or missing, your computer is going to get confused. And you may think, that's silly. File extensions, I don't ever do anything with file extensions. But one of the major challenges you run into with digital preservation is that people do play with those file extensions, and you could try this on your own computer: if you change a .doc file to a .pdf extension, you're not going to be able to open that file anymore. And if you as a digital preservationist received a thousand files and two of them didn't render properly, you're not going to know why that is. So there are actually software tools out there that will identify the file type even when the file extension is incorrect. And so that's just another concept that I want you to keep in mind.
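The two ideas just described, checksums and identifying a file's type without trusting its extension, can be sketched in a few lines of Python. This is only a rough illustration, not the software the presenters use: the function names `sha256_of` and `sniff_format` and the small `SIGNATURES` table are made up for this example, and the table covers only a handful of well-known formats.

```python
import hashlib

# Leading bytes ("signatures") that a few common formats start with.
# Illustrative and far from complete; real identification tools use much
# larger registries of signatures.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2 (legacy .doc/.xls/.ppt)",
}


def sha256_of(path, chunk_size=65536):
    """Return a SHA-256 checksum of the file's bytes as a hex string.

    If even one bit of the file changes, this value changes, which is
    what makes a stored checksum useful for spotting bit-level damage.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sniff_format(path):
    """Guess the format from the file's first bytes, ignoring its name.

    Returns a short label from SIGNATURES, or None if nothing matches.
    """
    with open(path, "rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return None
```

In use, you would record `sha256_of(path)` when a file arrives and compare it against a fresh calculation later; a mismatch means the bits have changed. Likewise, a PDF that someone renamed to end in .doc would still come back from `sniff_format` as "pdf", because the check looks at the bytes and not at the name.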
This is an important one in the field of digital preservation. So the key here is really that we don't want to confuse our computers. At a very basic level, that's what digital preservation is about. So that's all you're going to hear from me for now. I'm going to pass the baton off to my colleague Lisa Gregory, who is going to talk to you about the challenges of digital preservation.
>> Lisa: Thank you, Amy. OK, so let's say you're the only person at your institution, or regardless, you're only one person. So how do you approach this mass of bits that comes into and out of your desk or your institution or department? Digital content is everywhere, but that doesn't mean you can't review it and select the content to preserve just like you do with paper. The first guideline, the thing you can think of first, is to look at what it is, not what it looks like or where it lives. For example, there is tons of content now on social media sites like Facebook, LinkedIn, Flickr, YouTube, and on hosted sites people are using in their personal lives and at work, like Google Docs or SlideShare. Think first about what is important for you or your institution to preserve before you think about where it is, and tackle that part second. The second guideline is, some of it will need to change at the beginning. So there are different file formats, and you can kind of lump them into proprietary, or nonproprietary and open, file formats. There are distinctions, but to boil it down, for proprietary formats, the code is owned by a company or a person who may have no incentive now or in the future to let anyone else look at the code. Whereas for open formats, the code is available to all. Jennifer will be giving you examples of these files later on in the presentation, but a lot of times institutions or people will decide that having all of the content they want to keep in open formats is a better decision for preservation into the future.
So as content comes in, people will change what they get into an open format to assist in preservation. Another thing that you might want to decide to do is normalization. This is kind of tangential to the proprietary, nonproprietary debate. You can decide what is normal, or what you would like all of your content to look like, and based on best practices, or the resources at your institution, or your mandate, you can adjust all those formats as you take them in. So there are different nuances to the process, but just realize that as things come in, you may not want to take them as is. Another challenge for the idea of what to keep and how long to keep it is that some of it will need to change down the road. And another topic that will be touched on a little later is migration, which is the process of bringing a digital file to a newer version, or saving it as a different, more preservation-friendly file format. So sometimes when we take files in, we don't know immediately that they're going to need to change at some point in the future. And down the road we realize that in order to keep those files viable for access, we need to migrate them to a newer version. So I guess the bottom line of this slide, or the point I want to get across, is that when you're trying to decide what to keep and how to keep it, it's a fluid process, and you need to address different issues at different points in the digital object life cycle. This is sort of a helpful idea for me, and a lot of people have kind of latched on to it. A helpful way to think about a digital object, so it doesn't seem as discrete, or like something that can be ignored, is thinking about it as having a life cycle. Which I've kind of represented in this little almost recycle-type image here. So a digital object goes through a lot of different changes, just like any human would. There's the creation part, there's the management part, and then there's use and reuse.
And you'll see in this diagram that there are little people along every step of the way, and you might be one of those people. In fact, most all of us are definitely creators at some point. And so these are things to keep in mind. I've decided to associate some clichés with each part of this life cycle to help you remember things not to do. So for creation of digital objects, you need to consider the long-term needs of the object. And also the long-term needs of your users. And you need to do this from the beginning. So you can't do this on the fly. And that's cliché number one. So here's some things to keep in mind for creation, and I have them listed at the bottom. First, storage. Just like a physical item like a book or a television, or a painting that you hang on the wall, you have aspects to manage about digital objects as well, which include storage space and the environment. So rather than do first and ask questions later, keep in mind that whatever content you take in or decide to preserve, you're going to need storage of some sort. And that storage needs to be managed. So that's always a good thing to have in the back of your mind when you create an item. File format. This is something that I've mentioned briefly, and Jennifer is going to go into more discussion about that later on. But there are positives and negatives to the different choices you make about which file format you save your file in. And so that's something that should impact your choices at creation. File detail. This is something that a lot of people don't give a second thought to. But in many different software packages, there's a place for you to associate different file details with your files. So here's an example. I've taken some screen shots of, I believe that's Microsoft PowerPoint, Microsoft Word on a Mac, and that's Adobe Acrobat.
Properties is sometimes what they're called, or file details. Librarians and archivists often talk about this stuff as metadata. And usually if you go in there, you're going to get some different changeable options, things you can specify, like the title, the author, subject, and keywords. And then if you look at the screen shot at the bottom, it also has a lot of different file details that you can't directly change, at least in this interface. But there are definitely file details and specifications that you as an object creator have control over. And you can add those details, and that's often helpful later on, either to yourself, if you're managing your own objects, or to whoever ends up managing your objects for you. I didn't quite finish here. Metadata. I've got it in there twice, and no, that's not a mistake. That's just to bring home the point that you probably need more metadata than you think. Metadata, for people who aren't geeked up about library and archival principles, means data about data. So those properties that I was discussing just a minute ago, that's metadata. Anything that describes your file, and may actually be embedded in the file or associated with it, those are things that can help you out tremendously down the line. So whether you're scanning something, or you're creating something born-digital, like through Microsoft Word, it's integral for understanding what a file is later on, whether that's you or someone else. Finally, for creation, think about future uses. This is something you can't always do. You don't know how someone will use all your files later on. But as an example, especially for people who are scanning documents, you have control over those details, things like file resolution, which I know Jennifer will touch on, and things like that can make a difference down the road for different uses.
For instance, let's say I have scanned a photo and I want to print it out as a four by six-inch document right now, but my boss comes in next year and says he wants it as a four by six-foot banner. And so if you can anticipate some of those needs and what others might possibly use your files for later on, that can help you create objects in a smarter way. Let's move on to the next part. This is digital object life cycle management. All of us, to a certain extent, as creators, most likely end up managing, whether it's our own data at home or something at work directly under our control. Some of us are lucky to be able to manage others' documents as well. But you definitely want to make sure all of your eggs aren't in one basket. First thing, backups. Digital preservation isn't the same thing as backups, just like it's not the same thing as digitization. But preservation should definitely always involve backups. And that's your traditional notion of taking a file, or a specific folder, or an entire drive, and making sure that there's a copy somewhere else. The other thing right after that that people usually discuss is that those backups ideally should be in multiple locations. And I don't just mean digital locations, I mean, if you're lucky enough or able to, in different geographic locations. So perhaps you've heard of LOCKSS, distributing your content in different places. I'm someone who likes to keep an orderly file structure. And that has sometimes led me to be a little trigger happy on the delete key. And so those extra copies that are distributed around can definitely come in handy. Next, when you're managing, you need to think about security and access. And most of us, I would imagine, are not deep undercover, and you don't have to keep things under constant lockdown, but that doesn't mean that everyone should have access to all of your files. Especially people like me, who end up deleting things sometimes.
You're not necessarily protecting your stuff from malicious people, but rather from human error and accidental deletion, or changes. Or even just moving things around. So just be smart about who can access your files. That's another component of being a smart manager of digital objects. People shouldn't get offended if they can't get into everything. Authenticity is another thing you'll hear thrown around: whether or not you can say with confidence that your files haven't changed since you received them. Or even since you created them, if you're managing your own stuff. So there's a whole spectrum of how authentic something needs to be. For some people, this is just something that has to happen to be a good steward of a digital object. And then for a lot of people in government or corporate situations, there are legal ramifications if they can't answer that question of, is this authentic, with a confident yes. And so the big thing about authenticity is: document your actions. And this is something that I'll reiterate a little bit on and off. You will need to make changes to a file, you may need to move things around or change the metadata, but make sure that you're documenting what's happening, so that if you ever are called out on an object, you can say, this is where it's been since I've gotten it, and this is what's been done to it. And there are other steps you can take, but those are the basic ideas about ensuring authenticity. Migration. I mentioned this earlier as the process of bringing a file forward in its version, or saving a copy as a different, more preservation-friendly file format, often for access. So that's something else that's good to keep in mind when you're managing digital objects. And finally, documentation. Things won't be perfect. If there's one thing I can stress, you'll have to make compromises.
That shouldn't stop you from actually trying to do things the right way, but there are things you'll wish you had done differently. Through it all, document, and someday down the road they'll thank you. Or you'll end up thanking yourself, if you're like me and a week later you can't remember why you did what you did, and you can go back to that documentation that you so smartly produced. The next stage of the digital object life cycle is use and reuse. Some of this bleeds over a little bit from management. It often focuses on the possibility of people other than you reusing your digital objects after you've moved on to a different job, or your children, or your children's children. It also includes destruction of digital objects and long-term retention. The big thing to keep in mind here is that "out of sight, out of mind" does not apply. Sometimes there are expectations where you work that your digital objects should be around forever. Sometimes you'll work in a situation where there's a retention schedule, and your business or your government agency says they definitely should not be around forever. Make sure you know which is the case, and follow the documentation on those retention schedules, because that often governs who can access things and for how long. People will want to access and even repurpose your files. When I say repurpose, I'm talking about maybe manipulating data, excerpting different parts of your files, or altering images. In some cases that's not allowed, but in others it is. Again, this goes back to the idea of trying to plan for different scenarios of how people will want to use, reuse, and repurpose things. For access and repurposing, make sure you have conditions of use teased out, either in your own mind or at a department level, or maybe even at an institution level, and make sure not only that you have them teased out, but that those conditions are available to people who can access your content.
It's a lot harder to navigate different repositories when you want to reuse objects if you can't easily find out what you're allowed to do with them. So you want to make sure you have policies in place and documented procedures, so that people can use and reuse your files throughout the life cycle, and you can stand up and say, yes, you can trust us with this content. Ideally you want to do that at the beginning, during the management part, and also at the end. If you follow some of these steps, do the documentation, and have the policies, you can say that, and you can say: we know what this object is, we know what's been done to it, and you can use it in this way. Not to be a bearer of bad news, but avoiding the digital dark age means knowing the threats and being prepared. So we're going to go a little bit into the doom and gloom here. Here are some threats to keep in the back of your mind. First, institutional support. Or not. Like the title of our presentation says, they'll thank you someday. Well, that day might not be today. You might have to advocate for digital preservation, especially now that budgets are tight. But there are a lot of simple things you can do to combat that, and Jennifer will be talking about that later. And there are a lot of reasons why it's important. I mentioned before that sometimes there's human intervention, either malicious or not malicious. Probably more often it's not malicious: accidental human error. There's also human error by loss, like losing computers. Let's say you take your laptop on vacation, because you're a diligent worker, and you're on your boat and the laptop goes over the side. Think about human error as something that can be insidious with your digital objects. Then there's the possibility of file corruption.
Over time, different processes on a computer and the degradation of different media can lead to file corruption, and if you don't keep on top of your files and what's going on with them, that can happen too. Computer failure: I think we've all experienced that blue screen of death, or at least heard about it if you're lucky enough not to have experienced it. The hardware or the software can cause you trouble on your computer. Misplaced metadata: I know I've definitely ended up digging out cassettes or VHS tapes that have no label, and you have no clue what's on them. I have had that happen in my work and in different jobs I've had as well. Or maybe you've had files on a shared drive and you don't know who created them or what they were about, or somehow the metadata, the data about your object, gets separated from the original object. That's something that can render an object pretty useless, and definitely slow down or stop people from accessing or repurposing your files. Natural disaster is something that people usually look to first. It's actually not the first thing I typically think of as the most insidious thing that happens with digital objects, but it is something that needs to be mentioned, especially if you're like me, or like a lot of places I've worked, where your computer sits on the floor or your server space is on the basement level. Flooding and other types of natural disasters could impact the different media that store your objects, so that's definitely something to keep in mind. And then last, my favorite, or my least favorite in some ways, is file format obsolescence. I've stuck the ghost in there because that's something you hear a lot when people talk about digital preservation.
It's the idea that file formats that we use now or have used in the past become obsolete, in that they're no longer supported and people can't access those files anymore. Again, that renders them useless, like Amy mentioned earlier. So here is a little bit more information about some of the challenges, just to set it up for Jennifer to give you all of the tools that you need to get started with addressing these challenges. First, metadata, and you'll hear this refrain throughout this presentation. This really adds value to your data. But different people create data. You name files one way, and someone else in your institution names them another way. Different people create it over time, especially if you're collecting born-digital files. The other thing is, how much do you add? It takes time to add metadata. It's adding value to your documents, but I know a lot of times going into the file properties takes, what, I don't know, 15 seconds? And it's something that I don't remember to do on a consistent basis. Also, like I said, if people are inconsistent in how they name things, it can cause trouble later on when you're trying to share your files. So I popped in an image here of the Free Expression Tunnel. This is at North Carolina State University, just down the road from us. What they basically did is give this underground tunnel to students and say, you can decorate it, quote unquote, however you like. Well, if you walk down this tunnel, the graffiti is often inscrutable. You don't know exactly what it means. Someone else created it. There's no rhyme or reason. So it's great for its purpose, but that is not the way you should approach metadata at all. The next challenge: authenticity. I talked earlier about the importance of making sure that you can at least document the fact that files are authentic.
The ones that you've gotten are what they were, the same as when you received them, or if not the same, you can tell everyone what's happened to them. So do you know where it came from? Is it the same as when you received it? Some people, for different legal or corporate reasons, need to be more concerned with this than others, but again, if you want people to trust you as someone who can take care of digital files, you need to consider how you can guarantee that something is authentic. And again, part of the answer to that is documentation. The example I've got here is a famous portrait of Shakespeare called the Flower portrait. It was believed to have been painted in 1609, though there were some doubts, and then in 2005, through the miracle of x-rays, they were able to confirm that it was actually painted in the 1800s and was not contemporaneous with Shakespeare. So, not authentic. Amy talked about file extensions earlier. I don't know if you have ever tried this, but anyone can rename a file extension right in Windows Explorer if you're using a PC, or in Finder if you're using a Mac. Things can happen to files all the time that make it hard to verify what they really are. They can masquerade as something else, through human error or software malfunction, and so there are different ways that you can try to figure out exactly what a file is, despite what its file extension might say. One of the biggest challenges, at least in my experience, is misperceptions about digital preservation. Some of you may have come to this presentation today with some of these misperceptions. These are definitely not the only ones, but here are a bunch that I find are the most common misperceptions about digital preservation. First, and this is something that you hear from general creators, but also from maybe your I.T.
support: sure, we back stuff up. At this point in the presentation today, I'm hoping that you realize that preservation isn't just about backups. While backups are a very important component, they're not the be-all and end-all, and it's not as if you can just set up an automatic backup and let it go for the future. A lot of times creators will say, no one wants to keep this stuff, no one is going to want this. Or, who could possibly need this? Well, different institutions have different mandates, and so maybe you're in a position where your institution has decided that someone is going to want your stuff in the future. But even if you aren't, there are probably tons of instances where great men or women died believing their work was worthless, but they stuck their letters in a drawer and those letters are now priceless to us. And there are a lot more instances of things that aren't necessarily priceless per se, but that improve our lives nonetheless. What I'm trying to say is that it's better to err on the side of caution than to assume that no one is going to want your stuff. Here's another one that I find: I'm sure my company backs up all of our files. It's very likely that your company does back up files, or does provide server storage space that is backed up. But that's not something you should take for granted. A lot of people believe that this is happening, that there are backups of their hard drives, and it only takes a few minutes to make sure this is true rather than assuming. That can make all the difference, either in the short term while you're there, or after you leave. Flickr, Google, Facebook, my service provider, insert whatever service you'd like, has to keep track of my data. This is another one that I've heard more and more, especially with the rise of social networking. Keep in mind, these places are still in it for business.
Some of them are aware that people are trusting them with their digital files, and some of them make more of a good-faith effort to preserve those than others. But there's no guarantee that they can or will keep your data. Sometimes there's information in that click-through legal services agreement that you kind of blast through, or at least I blast through when I'm signing up for something. Sometimes there are details in the frequently asked questions, or you can actually find documentation online. But there's really no way to rely on them to keep your data for a long time, especially if they're free or cheap. Just make sure that you understand what exactly you're trusting them with, and if you can, try to provide for different ways to keep track of that data in different places. Like I said before, multiple backups. Here's an example. Some of you may be old enough to remember GeoCities. GeoCities was a web hosting service, and when it was shut down, after 10 years, it had over 30 million pages. You can still go to the website and see that very short but sweet note, which I have clipped and shown you at the bottom here. And there are other instances, throughout the history of the internet especially, where things have gone away that people didn't expect to. So keep that in mind whenever you're trusting or trying to trust people with your data. Finally, the big C word: copyright. I find a lot of people get really scared when talking about copyright. But don't be afraid. Trust me. It's definitely a very complex area that's evolving almost on a daily basis. But one primary thing to keep in mind is that it kicks in at the point of creation, as soon as a human fixes content in a tangible form that is readable, either directly or using a machine. So that's the first point to keep in mind about copyright.
The second is that you can sort of wade your way through things, but 1923 is a good magic number, sort of magic, to keep in mind, and it can often be used as a safe cut-off point. Items created before that date, in the United States anyway, are most likely in the public domain, which means you can reproduce them without infringing copyright. Again, I said most likely. There are a lot of nuances to this, but I've got the URL here, and hopefully we can get that posted in chat so it's clickable, for an excellent copyright cheat sheet. I think it lays out very nicely a way for you, if you have an object that you would like to make available online and would like to preserve, to figure out whether or not you can do that without seeking a copyright release. OK. So I've kind of bombarded you with things to keep in mind, not with the idea that you'll retain all of this, but rather in the hope that it will at least tickle your brain when certain things occur as you're embarking on this whole wild ride that is digital preservation. Jennifer will now talk to you about strategies and suggestions to help you fight the good fight, as it were. >> Jennifer: OK. Thank you, Lisa. As she said, I'm going to focus on strategies you can employ to further your digital preservation efforts. The key, as you can see from this slide, is education. You need to educate the content creator. You'll remember how Lisa mentioned the digital object life cycle, and that starts when the object is created, which is a lot of times before the digital curator is involved. There are things that can be done by the content creator at the time of creation that will make preservation easier, so we'll talk about some of the things you might suggest to your content creators. Also, as you probably picked up on, digital preservation requires proper file storage and management over time.
Which often means that you'll need to work with your I.T. staff. That I.T. staff may not always understand what digital preservation is, so you may need to spend time educating them as to the specific needs of a preservation environment. And finally, in order to educate others, you'll need to make sure you know at least the basics about digital preservation. So educating yourself is a must as well. Let's start by outlining some things that you might want to educate your content creators about. One thing it's important to make sure your content creators understand is that open file formats are much easier to keep accessible over the long term than proprietary formats. Lisa discussed what it means for file formats to be open standard, or nonproprietary, so I won't go into a lot of detail here, but I'll reiterate that their value lies in the fact that their specifications are published. In the future, if there isn't any software available to open the file, a programmer can look at the specification and write the necessary software to make the file accessible again, which is very important. As you can see here, I am providing a few examples of open file formats that you might want to suggest that your content creators use for various types of content. I'm not going to read them all off, but you can see there are several options for each content type, so hopefully your content creators won't find this request too limiting. The hard part, of course, is that using proprietary file formats has become second nature to people. We all sort of just use Office because it's there, and it's what's provided. So this really becomes an effort to train people to reach for a different software tool to do their work. It's easier said than done, I know, but it is a huge help when it comes to digital preservation. So I'd recommend thinking about it if you want to make sure future users can access your creators' works.
And then if you want to learn more about open standard file formats and how their use benefits the digital preservation process, here are a couple of resources you can check out. Just for your information, I have several of these resource slides in the presentation. You don't have to scribble down these URLs, because as Jennifer Peterson mentioned, all of them will be posted on the archive page for this session. So just check there and you can get these URLs. I'm doing this because we have a limited amount of time and a lot of content to cover, so I can't go into too much detail about anything. But again, if you have questions, please type them in the chat and we'll answer them in chat or at the end of the presentation. So if your creators will be digitizing to create content, which was talked about earlier, there are two things that are important for you to educate them about. First and foremost, as Amy has hopefully burned into your brains, and as it looked from the poll like you might have come to the table already knowing, digitization is not digital preservation. And second, and only slightly less important, is that scanning guidelines need to be put into place and followed. Scanning guidelines are really important because they help ensure image quality is consistently high. They encourage the use of open standard file formats that help broaden and maintain accessibility. And if well crafted, they can also decrease the likelihood that items will need to be rescanned in the future to maintain accessibility. So what types of recommendations are included in scanning guidelines? Well, things you might want to include are scanning resolution and file format recommendations for various types of content, like text, photographs, and maps. Suggested hardware configurations, software considerations, quality control, file naming, scanner and monitor calibration targets, and color bars.
Storing images, and recording and verification of CD-ROMs. Here are a couple of digitization guideline documents you can look at to see what's covered. Some guidelines are more detailed than others, so you can decide what level of guidance is appropriate for your particular situation. However, the more guidance you can offer with respect to certain areas like resolution, file format, file naming, those types of things, the more it will benefit the person charged with preserving the digital objects created. We've mentioned file naming a couple of times, and this is another area that it's important to educate content creators about. If they can name files in a consistent manner and according to certain rules, it will make moving these files into a preservation environment much easier. In general, good file naming practice includes such things as naming a file in a meaningful and unique way, and without the use of any special characters. So what does that mean? Well, naming a file in a meaningful and unique way generally means naming it so that it doesn't have to be opened to identify the contents. This would involve summarizing the title, including a version number or draft number, and perhaps including a date. If a date is included, good practice is to format it in the standard way, which is year, year, year, year, month, month, day, day, so that if the files are sorted they'll sort into chronological order. And the special characters to be avoided include the space and any character that's not a letter or number, with the exception of the underscore character. So here are a couple of examples of bad versus good file names. And that's all relative, of course. In the first example, unless you knew ACBD was an abbreviation for the academic board, it wouldn't be obvious what was in the file. And I should point out that when determining this, you should think about it in terms of someone looking at the file name in the future.
Say, 10 or 15 years from now, would they know what that abbreviation was? Also, in this first file name the date isn't necessarily clear. It could be April 6th, or April of 2006, or if you were reading it as a British date, it could be June 4th or June 2004. In its good form it's fairly clear what's in that file: it's the academic board meetings from April 18th, 2006. And there's no need to open the file to be able to tell that. For the second example, spaces are used, and the term "current" is used in the name, which, when taken out of context, again looking back from the future 10 to 15 years, doesn't really tell the user anything about when it was actually current. Also, there are no specifics about what team is being described in the file. Again, in its good form it's fairly clear that this file contains the ITS Services organization chart from 2007. You might think that this is being a little nit-picky in terms of file naming, but if you're getting files from multiple teams and they're all sending over their current team chart without using some standard file naming convention, you'll probably get multiple files with the same name, a name such as "team chart" or "current team chart," and if you get more than one file with the same name, you run the risk of overwriting a file. Not to mention the risk that the creator will overwrite their own file the next time they update the chart. Also, if you automate your processes or are working with certain computer platforms, many of these are unable to process file names with special characters, which will consequently hinder any batch processing you attempt. So while it may seem trivial, good file naming can be very helpful. I know that this is sort of a hard concept to grasp in the abstract, so here are a few resources you may want to look at to help you understand what good file naming practices entail, and why they're important.
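The file naming rules described above (a meaningful name, only letters, numbers and underscores, and a date written as year, month, day) can be sketched as a small Python check. This is a minimal illustration and not a tool from the presentation: the function name `check_file_name`, the list of vague words, and the sample file names are all assumptions made up for this example, since the slide's actual file names are not in the transcript.

```python
import re
from datetime import datetime

# Letters, numbers and the underscore are the only characters allowed in the name.
ALLOWED = re.compile(r"[A-Za-z0-9_]+")
# A run of exactly eight digits is treated as a YYYYMMDD date.
DATE = re.compile(r"(?<!\d)\d{8}(?!\d)")
# Hypothetical list of words that lose their meaning over time.
VAGUE_WORDS = {"current"}


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    stem, dot, _extension = name.rpartition(".")
    if not dot:  # the name has no extension at all
        stem = name
    problems = []
    if not ALLOWED.fullmatch(stem):
        problems.append("contains spaces or special characters")
    match = DATE.search(stem)
    if match is None:
        problems.append("has no YYYYMMDD date")
    else:
        try:
            datetime.strptime(match.group(), "%Y%m%d")
        except ValueError:
            problems.append("date is not a valid YYYYMMDD date")
    words = re.split(r"[^a-z0-9]+", stem.lower())
    if VAGUE_WORDS.intersection(words):
        problems.append("uses a vague word such as 'current'")
    return problems


if __name__ == "__main__":
    # Both sample names are invented for illustration.
    for sample in ["current team chart.doc", "its_org_chart_20070101.pdf"]:
        print(sample, "->", check_file_name(sample) or "looks fine")
```

Because the date is written year first, a plain alphabetical sort of names that share a prefix also puts them in chronological order, which is the reason for the format recommended above.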
So file organization is another important concept to educate creators about. And no, this is not a screenshot of my desktop. But I think we can all relate to being a little disorganized with our digital files. However, content creators are really performing a pre-preservation management function, and in order to make sure that archival files are preserved, it's important for these files to be organized and managed. Everyone has their own way of organizing, so this is sort of a touchy subject, but one thing that could help make sure archival content is preserved is to separate working files and final files. In many cases only the final files will actually be archival and require preservation management. So separating them during the creation process will save the creator time when they're trying to pull together all of their archival files to transfer to the digital curator. Organization is a very personal issue for many, so keeping your recommendations at a high level, like keeping drafts separate from finals or keeping archival files separate from nonarchival files, is probably best. Getting content creators to share their archival content with a digital curator is sometimes a huge stumbling block. There are various reasons, and I'll talk about a few and offer some suggestions. But this is an area where it's probably wise for you to look at the literature on the subject, because problems tend to be fairly specific, and I'm just going to be touching on them at a high level. One stumbling block has been that content creators don't know that someone is willing to actually preserve their content for them. Or maybe they don't know that it's worth preserving, or they're not aware that it's not already being preserved. To combat this, many institutions have established outreach campaigns to make creators aware of who is responsible for preserving their digital content, and what content is considered preservation worthy.
And what content is already being preserved, maybe through some automated processes if they happen to exist. Or to alert folks that it's not actually already being preserved. So while this doesn't solve all the problems, it does seem to raise awareness with content creators that certain content is valuable, and if somebody else is willing to do the hard work of preserving it, creators seem to be open to sharing under most circumstances. However -- yes? >> Jennifer P.: I'm going to jump in for a second. I just wanted to check in with folks. It looks like there are some folks having audio issues, and just a reminder that you can reset your audio by going to the communicate tab, leaving the audio broadcast, and then rejoining the audio broadcast. That often resets it. If that continues to be a problem for you, the telephone number is available under the little telephone request button that's at the top of the chat box. That should provide you with a pop-up window with the toll-free number and the access code for today's session. And I'd also like to ask, because we are probably going to be moving into a portion of the presentation where there will be lots of questions for our panelists: if you have needs or comments about the technical environment, please remember to post those to the Q and A panel. It has a little question mark and the Q and A, and that way we can keep chat open for questions specific to our panelists. We've got plenty of time to address questions, and I know that this is a lot of information for folks. So hopefully we can get to some of your questions, and thank you to our panelists. I know that people are response right now, and are absorbing lots of information. So thank you, Jennifer, I'll let you continue. >> Jennifer: OK, thanks.
So another stumbling block that we run into is that digital curators would like to receive certain contextual information about a file along with the file when a submission is made, and this makes the submission process something that content creators view as cumbersome, so they avoid it. One approach to this problem is to automate and streamline the submission process as much as possible. This requires technical resources, which not all of us have. But if you do have them, the more the submission process can be built into the content creator's workflow, and the more contextual and other information about a file can be gleaned by examining the file or extracting the information from it using some software, instead of asking the content creator to type it in, the more likely the creator is to participate in the preservation process and supply their content. A final hurdle that many face is that creators want to retain some control over their creation. Many times, making the file immediately accessible to the creator themselves, the public, and/or a limited community, depending on what the content is and what's appropriate, is useful so that the creator knows the content hasn't gone into a black hole. Also, being open to restricting access to the file and/or replacing the file with an updated version, if the creator wants to make changes or finds out they have made a mistake, can increase participation. Of course, when I talk about restricting access and replacing files, I'm talking about the derivative copy you actually make accessible, not the master in your preservation environment. So I want to be clear about that. The replacement would likewise only be of the derivative copy, not the preservation master. So just note that you can be a little more flexible with your access copies than you can be with your preservation copies, and that may be enough to satisfy a content creator.
And the updated file, if you did take an updated version of a file, which we should, would simply be added to the preservation environment as an additional preservation master. I say flexibility in working with content creators with regard to access is very important because, if you remember, the alternative to working with content creators on whatever issue is preventing their participation in the digital preservation process is not getting their content at all. And that's missing the opportunity to preserve it. So it's good to be a little flexible sometimes. Just to reiterate what we can educate content creators about: they can benefit from your input on what nonproprietary file formats are and when to use them, creating and following digitization guidelines, using file naming conventions, organizing their files, and the importance of sharing their files with digital curators. Here I'll also mention that they might be interested in how to curate content themselves, if there's content that's important to them but not technically preservation worthy by whatever rules happen to be in place. So you may want to be prepared for them to take a bigger interest in the process once you open the lines of communication. Just as with content creators, digital curators also need to collaborate with I.T. staff. And as I mentioned before, sometimes I.T. has a different understanding of what digital preservation means than a digital curator does. So it's important to educate your I.T. staff about exactly what you mean by a preservation environment. One component that you might mention is the need for redundant storage that's spread out across various locations. Lisa talked about this a little earlier. This is just that distributed storage, distributed backups. And this is a basic tenet of business continuity and disaster planning, so it can't be completely foreign to your I.T.
staff, and at a basic level what this really requires is that you keep your data replicated in different geographic locations, so if there's a disaster at one location, you can still access your data from another location that won't have encountered the same disaster. Where it gets trickier is that I.T. is usually focused on treating everyone's data the same. So making sure they understand the difference between managing preservation masters and other files is very important. I'll delve deeper into this, but at the surface this includes having them avoid riskier storage media for the various replications, as preservation files are meant for long-term storage and many storage media have shorter life spans and/or are somewhat unreliable. So making sure that they use very stable and long-lived media is very important. Here are a couple of good resources about storage media, the risks associated with various types of storage media, and the need for redundant storage for preservation. You can read these and maybe even share them with your I.T. staff. As I mentioned before, I.T. is usually focused on treating everyone's data the same, because in general, it is the same. However, preservation storage requires more than the norm. Many I.T. folks think that backing up files equals preserving them. And this, as you hopefully have already gathered, and I will reinforce more shortly, could not be further from the truth. Backups are simply a copy of the data kept for a specified, and usually short, time frame, so that you can recover if you accidentally delete or change a file. Preservation, on the other hand, requires being able to access and use files over the long term. It requires a great deal of file management. What kind of file management does it require? Well, best practices indicate that files need regular virus checking and fixity checking. Fixity checking, which you may also hear called a checksum or a hash, is just basically the process of verifying that an object hasn't been changed.
Best practices further indicate a need for storage media monitoring and refreshing. And also those backups that we've talked about. In addition, preservation storage environments should be secure and, as I've already said, distributed. And they should also have an audit trail, as Lisa discussed, for all file activities. Finally, the environment must also allow for the implementation of a preservation strategy. And a preservation strategy is just a plan for maintaining access to the files over time with little to no loss of content, functionality, and look and feel. Examples of preservation strategies that you've probably heard of are migration and emulation. Clearly preservation management is way more than just backups. And so explaining that to I.T. is going to be very useful to you. I'm only going to go into a little detail about file management functions; there are resources to dig deeper on your own. I would suggest the NASCIO one is a good one to share with your I.T. staff, as NASCIO is an I.T.-based organization. Just to recap, some of the things you can educate your I.T. staff about include using the most reliable storage media, using secure servers that are geographically dispersed, performing virus checks at set intervals, and making sure there's an audit trail for all activities. So in order to be able to educate others, you need to educate yourself about digital preservation as well. Step one in this education is to learn about what needs to be done when you as a digital curator receive a file. This is intertwined with much of what we've already discussed, and if your content creators collaborate with you, your work will be much easier. Initially what you'll need to do is to run a fixity check, preferably on the file in the creator's storage space, because what you're trying to do is set a baseline for the file that you can compare against as you move and manage the file in the future.
As you may remember, the fixity check is the process of verifying a file hasn't been changed, so if you set a baseline early in the process, then you can use it to identify if or when a file changes. In addition to fixity, you'll need to run a virus check to make sure you aren't acquiring a file that can contaminate your archives. And you will also need to make sure that the file naming conventions have been applied, and that the metadata you need to manage the file properly is available to you. And if these aren't in place, you'll need to address that by changing the file name. If you did change the file name, you'll need to record the original file name in the metadata. So make sure that you don't lose any information through your edits as well. And if you need to add metadata by using some kind of extraction or file examination, you'll need to add that to the metadata file that you associate with that particular content. Also, what you'll need to do is determine the copyright status of the file content, so you can make sure that you comply with the law in your activities. It's also good practice to have an agreement with the person submitting the content, specifying your responsibilities with respect to the file. By that I mean what type of preservation you're going to be signing up for. Are you going to just do bit-level preservation, which basically means you'll give them back exactly what they gave you without making sure that they can open and use it? Or are you going to give them a usable file and do that through some preservation strategy? So these are just a few of the high-level activities involved when a file is first received. This initial processing is commonly referred to as ingest, so you may have heard that term, and these are some of the functions that are included in the ingest process. So these are a few resources that go into more detail about the ingest process. You may want to take a look at these.
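The ingest steps described above (set a fixity baseline, copy the file in under your naming convention, confirm the copy matches, and record the original file name) could be sketched roughly as follows. This is only an illustration, not the presenters' tooling: the function names, the JSON sidecar file, and the choice of SHA-256 are all assumptions, and virus checking and copyright review are left out because they depend on local software and policy.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest_file(source: Path, archive_dir: Path, archive_name: str) -> dict:
    """Copy one file into the archive and write a metadata sidecar next to it.

    Hypothetical helper: the sidecar layout is an assumption for illustration.
    """
    # Baseline fixity value, taken from the file where the creator stored it.
    baseline = file_checksum(source)

    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / archive_name
    shutil.copy2(source, target)  # copy2 also tries to keep the modified date

    # Compare the copy against the baseline before trusting it.
    if file_checksum(target) != baseline:
        raise IOError(f"Fixity mismatch after copying {source}")

    record = {
        "archive_name": archive_name,
        "original_name": source.name,  # kept because the file was renamed
        "checksum_algorithm": "sha256",
        "checksum": baseline,
    }
    sidecar = archive_dir / (archive_name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record
```

Calling `ingest_file(Path("Scan 1.tif"), Path("archive"), "coll001_item001.tif")` would leave the renamed copy and a small metadata file side by side, with the original name still on record.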
The OAIS reference model is sort of the holy grail of digital preservation. A familiarity with it is pretty much a must. So I would definitely suggest taking a look at that. So metadata is another important part of digital preservation that you should become familiar with. And as Lisa mentioned earlier, it's something we can't emphasize too much. There are several types of metadata: there's descriptive metadata, technical metadata, structural metadata, etc. And they each serve very important roles. Overall, properly employed metadata organizes information. It makes things accessible and discoverable, which is very important, and it helps curators with their digital preservation tasks. From a curator's perspective, it's important to know the software and hardware used to create a file so that we can, through whatever preservation strategy we employ, make that file accessible in the future when that software and hardware may no longer be available. Other examples of helpful metadata for curators are information about the legal status of the file: who holds the copyright, what permissions the curator has been given by the copyright holder, maybe at what point the copyright might expire. Also the preservation actions that have previously been applied to the file, and fixity information for verifying that a file hasn't changed over time. Without this type of information a curator would be working under very difficult conditions, and in some cases, like with respect to the fixity information, they would be hard pressed to substantiate the file's integrity over time. So you see that metadata is vital to a curator. I'm not going to talk about the various schemas, but you may hear about PREMIS or METS; all of these are just schemas for managing the metadata. So as you can see, metadata is a huge subject, and I've really only scratched the surface.
Here are some additional resources on metadata that can help you understand what it is and how it's important in the digital preservation process. So I spoke previously about a lot of the actions that are involved in preservation management. And these are all things that a curator is responsible for. Yes, you may work with your I.T. staff on them, but ultimately it is the curator who bears the responsibility of making sure the files received are preserved at the level agreed upon. And one thing that curators can count on is that technology will change. So consequently digital preservation requires constant monitoring and management of these files. And there are many preservation activities that are more advanced than we have time to delve into here. But some of the areas we haven't covered that you may want to familiarize yourself with are the two prominent preservation strategies, which we've touched on a little bit. The first is file format migration, which is transforming a file from one file format to another. This is a very popular preservation strategy because it is fairly easy to accomplish with the tools that are out there, and it's sort of a wait-and-see kind of strategy: you wait until a file format is becoming obsolete, and then you apply the migration strategy. You can also normalize files, as Lisa mentioned earlier, which is a transformation of a file from one format to another as well. You can do that as part of the ingest process that I discussed earlier. So it doesn't have to be a wait-and-see process, but a lot of times it is. The other preservation strategy is emulation. And basically that is recreating the environment required to access a particular file format. And this is in most cases a lot more difficult. It requires a lot more technical expertise. So it is a much less used strategy, but it is a very valid strategy if you have the opportunity to use it.
Other advanced concepts you may want to look into include persistent identifiers. Persistent identifiers are permanent, location-independent, and globally unique identifiers for a resource. Another is file format validation, which is the process of determining the level of compliance of a digital object to the specification for its purported format. And that is sort of what both Amy and Lisa talked about earlier with respect to whether that file extension is accurate. So there's tons of literature on all these concepts, so you shouldn't have any trouble learning about them if you're interested. And to get you started, this slide has information about a few tutorial-type sites you can look at to learn more about digital preservation. And the last link, and this is full disclosure, is to a site that the three of us here actually created to help government employees in our state learn about digital preservation. So we highly recommend that one. And so to summarize, what you as a digital curator need to know about to get started is the best practices for ingest or acquisition procedures, how to make the content accessible through metadata, both in the near and long term, and what is involved in the process of preservation management. And I know it's a lot of information, but hopefully we have at least given you something to start with today, and maybe raised your interest level. So now that you've heard our pitch, we're going to ask you the same question we started the session with one more time, and that is, do digitization and digital preservation mean the same thing? And you vote the same way: it's still the red check mark over the chat box. And while you're voting we want to thank you for hanging in there with us for this lengthy session, and now we're going to respond to your questions. So feel free to type any questions you have into the chat and we'll do our best to field them.
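As a small illustration of the file extension point mentioned in the presentation, the sketch below checks whether a file's first bytes match the well-known signature for the format its extension claims. It is only a first-pass identification check under stated assumptions, not validation against the full format specification, and the `SIGNATURES` table and function name are made up for this example.

```python
from pathlib import Path
from typing import Optional

# Leading bytes ("magic numbers") for a few common formats.
# TIFF has two variants, one per byte order.
SIGNATURES = {
    ".pdf": (b"%PDF-",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".tif": (b"II*\x00", b"MM\x00*"),
    ".tiff": (b"II*\x00", b"MM\x00*"),
    ".png": (b"\x89PNG\r\n\x1a\n",),
}


def extension_matches_signature(path: Path) -> Optional[bool]:
    """Return True/False for known extensions, or None if the extension is not listed."""
    expected = SIGNATURES.get(path.suffix.lower())
    if expected is None:
        return None
    with path.open("rb") as handle:
        header = handle.read(8)  # the longest signature above is 8 bytes
    return any(header.startswith(signature) for signature in expected)
```

A `False` result suggests the extension does not match the content and the file deserves a closer look; a `True` result only means the header looks right, not that the whole file complies with its specification.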
>> Jennifer P.: This is Jennifer Peterson. I want to jump in and thank you for the formal presentation, and I look forward to more questions being answered. I have to confess that I am not a digital archivist or curator, so I'm going to let Amy lead the moderation of the questions. Feel free to let me know if you need any assistance, but I know there were a lot of questions that you all have answered already, and if there are others that you don't get answered, go ahead and post those again, and Jennifer and Lisa and Amy can answer. >> Presenter: I see questions about migration, and I can give you a little more information about that. It's something that we didn't want to go into too much because this is supposed to be kind of introductory, an overview for beginners. But migration is something that we've been working on recently here, and basically there are a lot of things up in the air about it. You need to sort of decide what different format attributes you might possibly be willing to compromise on, because a lot of times when you migrate you are going from something that maybe has more features or fewer features, and that's something that you're going to lose as soon as you transfer to a newer version. The idea is that you try and make the best of it, you keep what you can keep, and you always save your original version. That way you can then go back to those original bits, 1s and 0s, and possibly migrate to a different version later on if necessary. And so migration is sort of an ad hoc process for a lot of people at this point. But it's definitely better than the alternative, which is not being able to give people access to your content. I don't know if Jennifer wants to add anything to that? No, she's shaking her head no. If you have any more specific questions we'd be glad to tackle those.
>> I saw a question about whether there is digital preservation management software that can assist an organization with managing digital preservation. Are there digital preservation tools and platforms such as Microsoft SharePoint or Lotus Notes? And I'm going to start the answer to this question, but I'm hoping one of my colleagues can jump in. There are software tools that are available, and depending on your institution, some will be more functional than others. So SharePoint could probably be structured to do some of the digital preservation management that we've been talking about, at least at a very basic level, but there's nothing that exists at this point that would do all of the things that we've talked about. And there are lots of grant-funded projects that are trying to build these tools, and large institutions that have managed to build systems, but they tend to be very specific to their institutions. So University of Texas at -- not Austin, I want to say University of North Texas, is doing some really interesting work in this area, but again, they've got programming staff and lots of resources to dedicate to this kind of work. So I would have to say at this point there is no easy answer to this question, and that's why we are approaching it with the idea that there are individual things that you can do as a curator or creator of content, but there's no end-all, be-all solution that you can purchase and install on your computer. And I don't know if any of my colleagues have anything they'd like to add to that. >> Nope. >> OK. So then we have another question about different backup formats. And there's a discussion going on about USB and portable hard drives, and I know there was a question about CDs and DVDs. It looks like Lisa has been answering it on chat, maybe she wants to give us more information? >> Lisa: Sure, it's a lot quicker to say it out loud. The idea is to minimize your moves.
You're going to have to move regardless of which storage you choose. You can think about DVDs and CDs: you're going to have to move them quickest, so they're probably not the best option unless they're all you have. USB sticks, the same thing: not only will they not last as long, but their portability is why we use them, and this could be a deficit. Then you get things like regular hard drives or external hard drives, which can be better choices. Those are things that are more easily managed, and you can get them really cheap now. You can also put them in multiple locations, which is another great thing. And then I think someone is mentioning tape storage. That's when we're moving to things that take a little bit longer before they deteriorate or would require moving your content. Tape storage is an option that a lot of people use, and still use. It can be difficult to access things off of tape, but it's definitely something that a lot of I.T. departments are still proponents of, and in that case it may be one of your best options. So, yeah, I think someone mentioned a cheap idea: try and look into external hard drives, something like that. It's something that you definitely don't want to trust all of your content to, but it can be used as some of those multiple backup storage locations. I hope I've gotten at the meat of what all these people were originally asking about storage. >> Amy: Thanks, Lisa. I think there's another question here that maybe Jennifer could answer, and that is related again to storage, and it's what are the best low-cost ways to store other than shared drives? >> Jennifer: Well, it's low-cost, but you give up some control: you can always outsource and use a vendor for storage. It's usually cheaper because they're doing it in volume. You do lose some control because it's not on your premises anymore.
But if you research well, you can probably find some that meet your needs and are at a price point that can work for you. >> There are a lot of questions about checksums and running checksums and how do we do it, and I can answer this on a basic level. There are software programs that are free that you can download, and you can batch run checksums. So basically what we're saying is, you take your files, and you run them through this program and it will output these strings of letters and numbers. And they are unique to that file. So if anything were to happen to that file and you were to run it through this checksum software again, that string of letters and numbers would change, so you would know at least that that file was somehow changed without your knowing, and you could take one of your backup files and replace it. So I don't think that I.T. pros have to handle this. I think it's actually very easy to do manually. If you wanted it to happen every day, you'd probably want your I.T. staff to write some sort of program that would do this automatically. We tend to run our checksums any time we move a file, or if we open a file. If anything happens to that file, we will run the checksum again to ensure that that file is exactly the same. And this is probably the best-case scenario if you're in a smaller institution and don't have a lot of I.T. support. We really don't. We are three librarians and we're probably a little more comfortable with technology than some, but we definitely are not I.T. professionals. So I think we've provided a few different examples; if you do a search on files, you're going to find that program, and what you're going to get is a list of different algorithms you can use. There's MD5 and SHA1. Those are the names of the algorithms. If you use MD5 the first time, you should always use it. Does anybody want to add anything? OK.
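For readers who would rather script the batch checksum run described above than download a program, here is a minimal sketch using Python's standard `hashlib` module. The function names and the idea of holding the results in a dictionary are assumptions for illustration; MD5 is used as the default only because it is one of the algorithms named in the session, and `"sha1"` or `"sha256"` can be passed instead.

```python
import hashlib
from pathlib import Path


def checksum(path: Path, algorithm: str = "md5") -> str:
    """Return the checksum string (hex digest) for one file."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path, algorithm: str = "md5") -> dict:
    """Batch run: map each file's path (relative to folder) to its checksum."""
    return {
        path.relative_to(folder).as_posix(): checksum(path, algorithm)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(folder: Path, manifest: dict, algorithm: str = "md5") -> list:
    """Re-run the checksums and list files that are missing or no longer match."""
    problems = []
    for name, recorded in manifest.items():
        path = folder / name
        if not path.is_file() or checksum(path, algorithm) != recorded:
            problems.append(name)
    return problems
```

The same algorithm has to be used for the first run and every later comparison, which is why it is passed explicitly to both functions. Any file that `changed_files` reports is a candidate for replacement from one of the backup copies.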
So then we've got another question here that should make us -- Lisa wants to ask something. >> Lisa: I was going to chime in about a different question, the question about the best file formats to avoid migration. And I guess that goes back to the idea of open file formats versus proprietary, but really it's predicting in the future what's going to go away and what isn't. And there are even some people who say we shouldn't even be concerned about this. But because we can't predict, generally the idea is to go with file formats that are, number one, open, and number two, in common usage if you can't do open. And then after that, you know, just keep in mind that in the future you may have to migrate, but that's the way to diminish how often you're going to have to migrate. >> Amy: And I think there are a couple of questions about checksums I want to follow up on. One is, if the checksum number does change, doesn't that mean the file is already corrupted and it's too late? I think Emily asked that question. Yes. Ideally, you already have a backup of that file, so you're going to go to your backup and replace the corrupted file with your backup. If you don't have a backup, that could be a problem. That's why we suggest having multiple copies in multiple locations. Now, if you have opened a file and edited that file purposefully, your checksum is going to change, because that checksum is responding to the 0s and 1s in your file, and the order that those 0s and 1s appear in. So if you change something in your file, the 0s and 1s are going to change as well. So you need to rerun that checksum, get the new number and letter combination, and then you will record that number and use that same algorithm in the future. So ideally you're running checksums on files that are archival; it may not make a ton of sense to be running checksums on files that you're currently using.
So anything that has moved past use to storage, or just to preservation, those are really the files you want to focus on for your checksum application. Does that make sense? I hope so. I think we've got a big question here that I'm going to let -- Jennifer has something she wants to add. >> Jennifer: Yeah. I just wanted to say that you can change the file name without impacting the checksum. So if, as part of your file management, you do find that you need to change the file name, that will not impact your checksum. So you are still safe there. So I just want to throw that out there. >> Amy: OK. So the next question that someone brought up, and I think this is the kicker of the day, says: so describe an ideal preservation set-up. On servers, backed up, in some other format? I'm going to let one of my colleagues -- looks like Jennifer has this one. >> Jennifer: I'm the lucky soul. So basically that OAIS reference model lays out, at a very high level, because it is a reference model, the ideal preservation set-up. But, yeah, you're going to have all of the components that we talked about, with basically a storage piece, you're going to have an access piece, you're going to have this ingest function. There's a whole process that a file will go through that is outlined in that OAIS model. In terms of just the infrastructure, yeah, you'll definitely have your content stored on servers, you'll have some administrative functions on servers, you'll have your management run, you'll have sort of a holding area at the beginning, a staging area for your ingest before files go in. You'll have a quarantine area for files that are infected with a virus. So there are a lot of different components to the ideal preservation system. And truly folks have employed preservation systems, but I don't think anybody has actually said that what they have is truly the ideal. So I think we're all still working toward it.
But it is a very complex system that does have to take into consideration that you have different aspects that need to be separated from one another. So hopefully that somewhat answers your question. >> Amy: Another question about checksums, and that was, where is the checksum information recorded, is it done automatically? So I was just in the process of answering this one. Checksums -- wait a second. What was the question again? So the checksums have to be kept separately, and there are a lot of different ways you can do this. You can keep them in a spreadsheet, in a database, or in a content management system, and you would associate that checksum with the file, probably based on the file name. So that way in the future you can go back and look at it. So it's something that can be done in a very rudimentary way, manually, or you can try and get one of these more complicated systems to keep your metadata and your checksums and associate them more on a batch basis with your files. >> Amy: I've got another question here I'm going to give to Jennifer, and it is, what's the best way for the public to access your files? Your digital preservation files? >> Jennifer: All right. So truly you don't necessarily want the public accessing your preservation files. You would want them to access a derivative copy. So what I mean by that is, your preservation file, for example, for an image would eventually be a TIFF file, which is huge, and if it's done at high resolution, it will have a lot of detail in it. When you make a file accessible to the public, you probably want it to be a JPEG, a compressed file, much smaller than a TIFF file. And they lose some of the crispness of the image. But that is the file that you would want to make publicly accessible. And in a lot of instances, for organizations like archives, it's because they would be selling, or making money off of selling, the actual image itself.
So you don't want to give things away. So you want to use a much lower quality version of that image for your access copy than you actually have in your preservation environment. When you're sharing it with the public, it's sort of like a carrot to get them interested, and then when they buy it, you would get them the actual high-quality image. And you could do that in any number of ways, whether you would burn it to a CD, but it wouldn't be giving them access to your preservation environment. That is an environment that you really want to keep secure, because you just don't want people to have access and accidentally, as Lisa was talking about, do something, you know, that could damage a file. So it's best not to give them access to your preservation masters, but instead to make a derivative and give them access to that. >> Amy: I've got a bunch of questions about conversion of files, and are there any software tools that do that. So Photoshop is the one everybody knows about, but I'm going to tell you my favorite batch converter out there is free, it's Microsoft friendly, P.C. friendly, and it's called IrfanView. I just sent the link, but I'll send it again. IrfanView. It does amazing things for no money at all. So save yourself $500 and use IrfanView for your batch conversion. There was also another question about metadata, and how do we capture metadata. There are lots of ways to do it. If you don't have I.T. support that can do this kind of work and automate some of the processes for you, obviously you'll have to do it manually, and you can do it for free; there are lots of office-like programs out there that have Excel-like programs. So that would be my suggestion, to look at something like OpenOffice. It's not just open in the fact that it's free, but it's also considered an open nonproprietary format.
So you can store your metadata in a cell-based program like Excel, but -- I don't know what it's called in OpenOffice, if anybody does, throw it into the chat screen. And you can go for it that way. Now, ultimately my suggestion would be that you save your metadata in a text file, a .TXT file, which is just ASCII. That is about as open and nonproprietary a format as you can get. So you can store your metadata in any of those files if you want, but your digital preservation storage format should really be something like .TXT or XML, something that is ASCII-based or possibly even Unicode-based. But, I want to say, the more basic the file format, the better. I hope that answered Harvey's question, and let's see what else we've got here. I've heard that IrfanView can strip out embedded metadata. It probably can; it can also embed metadata for you. So definitely play around with it. Another nice thing about IrfanView is that it can move your files without changing the date modified. Which is really good, because if you haven't modified a file, you don't want that date modified to change, for authenticity purposes. It goes back to this idea of provenance. You want to be sure that you're recording any change to a document, and so if you're moving files and that date modified changes, that's something you'd have to record, and explain why it changed. And if you haven't actually done anything to it, that doesn't make any sense. I don't know if anybody wants to answer that question better than I did. Nope? OK. We're all sitting in a room together here and looking at each other as we're talking.
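The two ideas above, keeping metadata in a plain text format and moving files without disturbing the date modified, can both be sketched with Python's standard library. This is an illustrative sketch, not a description of IrfanView or any tool named in the session: the function names and column names are hypothetical, CSV in UTF-8 is just one plain-text layout among several, and `shutil.copy2` only attempts to preserve file metadata such as the modified time, so the result is worth checking on your own storage.

```python
import csv
import shutil
from pathlib import Path


def write_metadata_csv(rows: list, fieldnames: list, destination: Path) -> None:
    """Save metadata rows as plain text (comma-separated values, UTF-8)."""
    with destination.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def copy_keeping_modified_date(source: Path, target_dir: Path) -> Path:
    """Copy a file into target_dir, asking the copy to keep the modified date."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copy2 tries to carry over timestamps
    return target
```

A file written by `write_metadata_csv` opens in any spreadsheet program or text editor, which fits the advice that the more basic the format, the better.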
>> The only thing I have to add about that IrfanView point, and I mentioned this in chat: I don't know if you were talking about unintentional stripping out of embedded metadata, and the only time I've had that happen with IrfanView is converting a PDF, but that is something that happens whenever you convert to PDF/A: some metadata is stripped out. I've also had unintentional metadata stripping using Photoshop. So it's something that sometimes happens regardless of what software program you're using. >> Amy: A couple of questions about file format. I'm going to throw these to Jennifer. One of them is when is PDF a good preservation format, and also would you want to store two copies of a TIFF file. >> Jennifer: So when is PDF a good preservation format? That is any time that the functionality isn't lessened by putting it into PDF: you don't lose content, you don't lose functionality, and you don't lose look and feel. That's pretty much how we judge whether a preservation format is appropriate, after you go through the open standards discussion. So PDF, while it is technically proprietary, they have published the standard, so it is considered an open proprietary standard. It's one of the hybrid ones; there are not many. So if you can still get all of the value out of the file by storing it as a PDF, I'd say that is an appropriate preservation format. And then as for the two TIFF files, was that it? That's one for basically making available to folks and one that you should keep as your preservation master. I would say you probably don't want two TIFFs, you probably want a TIFF and a lesser file, like a JPEG or something. You're not going to hurt your preservation file if you convert to something else. Your preservation master will remain intact. So you don't have to worry about trying not to use the preservation master for anything. If that makes sense.
You can totally just keep a single preservation master and not worry about keeping a duplicate of it as your usable copy. You should keep duplicates in terms of your distributed backups, but that's the only reason you would have the duplicates. >> Amy: There's a question here that I'm going to ask Jennifer -- Lisa to answer, and the question is, you mentioned automation in your educating content creators slide. Actually, was that Jennifer? OK. Jennifer, can you expand on what that is exactly? Automation? You mentioned automation in your educating content creators slide. Can you talk about what you meant by that? Do we need more information about what that means? OK. Whoever sent that, can they reword that question? We'll get back to you. I have another question here that was, any group set up for this type of information? I'm assuming you mean digital preservation information exchange. And for that question, we do have a listserv. It's a pretty low-traffic listserv, but you're welcome to go to our site, digitalpreservation.ncdcr.gov, maybe somebody can type that in, and join the listserv. There are also lots of good listservs out there, and I'm going to try to rack my brain to come up with a couple of them. But Best Practices Exchange is a really good place to go. And we're going to come up with a couple more, and we'll type them into the chat screen. I'm going to pop off here and see if I can read more questions. If Jennifer or Lisa has one they want to answer, go ahead. >> We got more clarification on the automation. It's in terms of the metadata submission. So, yeah, if you can streamline the process so that the creator doesn't have to type in all the metadata, what you would do would be to use a tool that extracts the metadata. There are software tools available that would extract the metadata, so that you could pull whatever the software that was actually used to create the file stores.
So it will store a lot of information like the creation date, the versioning, all of the information about the creator, time of creation, various things, and if it's digitized, a lot of times a system will store the resolution information that came along with the image when it was transferred in. So instead of asking them to actually type that type of information in again, it's already embedded in the file. And so you can automate to go ahead and pull that information out. Also, if you can, work things into their work flow. It's not always possible, but if there are shared drive locations where you can have them place content, so that it just becomes sort of where they store something when they're done with it, and you have access to go grab it, that's an ideal situation. You're not really asking them to take an extra step and e-mail it to you, or FTP it to a site, or burn it to anything and send it to you. The easier you can make it, the more likely they are to participate, really. >> Amy: OK. >> Lisa: This is Lisa. I'm seeing a question about what's the best way to make your JPEGs, those access copies, available to your patrons. And I think the suggestion was linking from your library catalog, and so I know some of our listeners aren't from libraries, but if you are, or even if you're not, that's one way you can do it. But a lot of people, either in addition to linking from a catalog or instead of linking from a catalog, use different open source or closed source content publishing systems. One that you've probably heard of, which is not free, is CONTENTdm, but there are also others that you can use that are made more for this type of presentation of images with metadata for the public. One that is also a little bit up and coming right now is Omeka, and I've popped a link in there for that.
And that is free, and it has customizable options, but you can use it out of the box. Systems like that are sometimes called web exhibit sites, and those are ways that you can push your JPEGs out to users. And then there's always just the plain old HTML page, which is low tech compared to some standards today, but that's another way that users can access them. The catalog is, however, often used as an entry point to these digital objects because many people using libraries are already familiar with and count on that, and also sometimes those digital objects are duplicates of something you may have in print. >> There's a question here: so I assume JPEG is something we should avoid? I am unsure of that. I don't think that's necessarily the case. But it looks like Lisa can answer this question.